Mining Massive Earth Science Data Sets for Large Scale Structure

Authors

  • Amy Braverman
  • Eric Fetzer

Abstract

The traditional way to look for large-scale structure in very large observational or model-generated data sets is to examine maps of means and standard deviations of parameters of interest on a coarse spatio-temporal grid. This approach is popular because it is easy to implement and understand, but unfortunately it discards almost all of the distributional information in the data. Moreover, maps are computed for individual parameters of interest, so they retain no information about relationships among two or more parameters. In this work, we use a modified data compression algorithm to produce multivariate distribution estimates for each grid cell. The algorithm optimally mediates between data reduction and fidelity loss using information-theoretic principles. Changes in these distribution estimates over time, space and resolution reflect large-scale data structure. This is the basis for a data mining algorithm that characterizes those changes using a pseudo-metric for the distance between distributions. We demonstrate the approach using data from the Atmospheric Infrared Sounder (AIRS) on board NASA’s Aqua satellite.
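
The abstract does not name the compression algorithm. The sketch below assumes it is an entropy-constrained vector quantizer (ECVQ). That choice is an assumption, not something confirmed by the text above. In this sketch, each grid cell's multivariate observations are replaced by a few representative vectors, each with a weight. The hypothetical parameter `lam` trades data reduction (code length in bits) against fidelity loss (squared error). The function name `ecvq` and all of its parameters are illustrative only.

```python
import numpy as np


def ecvq(data, k_init=10, lam=1.0, max_iter=100, tol=1e-6, seed=0):
    """Hypothetical sketch: summarize one grid cell with entropy-constrained VQ.

    ASSUMPTION: ECVQ stands in for the paper's unnamed compression algorithm.
    data: (n, d) array of observations (a 1-D array is treated as d = 1).
    lam:  Lagrange multiplier; larger values favour fewer, coarser representatives.
    Returns (centroids, weights, labels): a discrete distribution estimate with
    support points `centroids`, masses `weights`, and each observation's codeword.
    """
    x = np.asarray(data, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n = x.shape[0]
    k = min(k_init, n)
    rng = np.random.default_rng(seed)
    # Initial codebook: k distinct observations chosen at random, equal probabilities.
    centroids = x[rng.choice(n, size=k, replace=False)].copy()
    probs = np.full(k, 1.0 / k)
    prev_cost = np.inf
    for _ in range(max_iter):
        # Squared Euclidean distortion from every observation to every codeword: (n, k).
        dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        # Lagrangian cost = distortion + lam * code length in bits (-log2 p_k).
        cost = dist - lam * np.log2(probs)[None, :]
        labels = cost.argmin(axis=1)
        total = cost[np.arange(n), labels].mean()
        # Re-estimate: codewords that attract no observations are dropped;
        # survivors move to their cell mean and take the empirical frequency.
        counts = np.bincount(labels, minlength=len(centroids))
        kept = np.flatnonzero(counts > 0)
        centroids = np.array([x[labels == j].mean(axis=0) for j in kept])
        probs = counts[kept] / n
        # Renumber labels so they index the reduced codebook.
        remap = np.full(len(counts), -1)
        remap[kept] = np.arange(len(kept))
        labels = remap[labels]
        # The Lagrangian cost never increases, so stop once it stops improving.
        if prev_cost - total <= tol * max(abs(total), 1.0):
            break
        prev_cost = total
    return centroids, probs, labels
```

Applied cell by cell, the returned `(centroids, weights)` pairs play the role of the per-cell multivariate distribution estimates. Each surviving representative is the mean of the observations it codes. As a result, the weighted mean of a summary equals the cell's sample mean, so the traditional mean maps can still be recovered from it. The pseudo-metric the abstract uses to compare summaries is not specified here.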

Similar resources

Shaking up Seismology: Data Mining for Earthquake Detection

Seismic sensors collect massive quantities of data that contain a wealth of information about processes within the earth. Seismologists are increasingly adopting data mining and machine learning techniques to identify previously unknown earthquakes in large seismic data sets. Our new earthquake detection method, Fingerprint and Similarity Thresholding (FAST), enables waveform-similarity-based e...

Knowledge Discovery from Disparate Earth Data Sources

Advances in data collection and data storage technologies have made it possible to acquire massive Earth science data sets. In principle, these data sets could be transformed into great scientific discoveries. However, due to the heterogeneous nature and to the scale of the available Earth science data, traditional analysis methods are challenged and much of these data remain largely unexplored...

Efficient Techniques for Range Search Queries on Earth Science Data

We consider the problem of organizing large scale earth science raster data to efficiently handle queries for identifying regions whose parameters fall within certain range values specified by the queries. This problem seems to be critical to enabling basic data mining tasks such as determining associations between physical phenomena and spatial factors, detecting changes and trends, and content bas...

Land Cover Change Detection using Data Mining Techniques

The study of land cover change is an important problem in the Earth science domain because of its impacts on local climate, radiation balance, biogeochemistry, hydrology, and the diversity and abundance of terrestrial species. Data mining and knowledge discovery techniques can aid this effort by efficiently discovering patterns that capture complex interactions between ocean temperature, air pr...

Knowledge Discovery From Global Remote Sensing and Climate Data: Results from Supervised and Unsupervised Data Mining

This paper describes results and lessons learned from research activities designed to develop data mining and machine learning methods for remote sensing and Earth science data sets. These data sets are acquired by Earth observing instruments onboard polar orbiting satellites, in-situ observations, and model reanalysis and provide a rich source of information related to the properties and dynam...

Publication date: 2005